Geodesic Distance Histogram Feature for Video Segmentation
This paper proposes a geodesic-distance-based feature that encodes global
information for improved video segmentation algorithms. The feature is a joint
histogram of intensity and geodesic distances, where the geodesic distances are
computed as the shortest paths between superpixels via their boundaries. We
also incorporate adaptive voting weights and spatial pyramid configurations to
include spatial information into the geodesic histogram feature and show that
this further improves results. The feature is generic and can be used as part
of various algorithms. In experiments, we test the geodesic histogram feature
by incorporating it into two existing video segmentation frameworks. This leads
to significantly better performance in 3D video segmentation benchmarks on two
datasets.
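The core construction can be sketched in a few lines. The snippet below is a toy illustration under assumed inputs (hypothetical superpixel mean intensities and a boundary-weighted adjacency graph, not data from the paper): geodesic distances are shortest paths over boundary costs, and the feature is a normalised joint histogram of intensity and geodesic distance.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Toy setup: 5 superpixels with mean intensities, and an adjacency graph
# whose edge weights encode boundary strength between neighbours.
intensity = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
edges = {(0, 1): 0.2, (1, 2): 0.5, (2, 3): 0.1, (3, 4): 0.4, (0, 2): 0.9}

n = len(intensity)
w = np.zeros((n, n))
for (i, j), cost in edges.items():
    w[i, j] = w[j, i] = cost

# Geodesic distance = cheapest boundary-crossing path between superpixels.
geo = dijkstra(csr_matrix(w), directed=False)

def geodesic_histogram(src, bins=4):
    """Joint histogram of (intensity, geodesic distance) seen from `src`."""
    others = np.arange(n) != src
    hist, _, _ = np.histogram2d(
        intensity[others], geo[src, others], bins=bins,
        range=[[0, 1], [0, geo[np.isfinite(geo)].max()]])
    return hist / hist.sum()  # normalise so features compare across regions

feat = geodesic_histogram(0)
```

Each superpixel thus carries a global descriptor of how intensity is distributed as a function of boundary-aware distance from it, which is what makes the feature usable as a drop-in term in different segmentation frameworks.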
Structure, dynamics and bifurcations of discrete solitons in trapped ion crystals
We study discrete solitons (kinks) accessible in state-of-the-art trapped ion
experiments, considering zigzag crystals and quasi-3D configurations, both
theoretically and experimentally. We first extend the theoretical understanding
of different phenomena predicted and recently experimentally observed in the
structure and dynamics of these topological excitations. Employing tools from
topological degree theory, we analyze bifurcations of crystal configurations in
dependence on the trapping parameters, and investigate the formation of kink
configurations and the transformations of kinks between different structures.
This allows us to accurately define and calculate the effective potential
experienced by solitons within the Wigner crystal, and study how this
(so-called Peierls-Nabarro) potential gets modified to a nonperiodic globally
trapping potential in certain parameter regimes. The kinks' rest mass (energy)
and spectrum of modes are computed and the dynamics of linear and nonlinear
kink oscillations are analyzed. We also present novel, experimentally observed,
configurations of kinks incorporating a large-mass defect realized by an
embedded molecular ion, and of pairs of interacting kinks stable for long
times, offering the prospect of exploring and exploiting complex collective
nonlinear excitations, controllable at the quantum level.
Comment: 25 pages, 10 figures; v2 corrects Fig. 2 and adds some text and reference
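As a minimal caricature of the effective-potential idea (the standard Frenkel-Kontorova picture, not the paper's zigzag ion crystal), the Peierls-Nabarro potential is the static crystal energy minimised over kink configurations with a prescribed centre X:

```latex
E[\{x_n\}] = \sum_n \Big[ \frac{\kappa}{2}\,(x_{n+1}-x_n)^2 + V(x_n) \Big],
\qquad
V_{\mathrm{PN}}(X) = \min_{\{x_n\}\,\text{kink centred at } X} E[\{x_n\}] .
```

In a homogeneous lattice V_PN(X) is periodic with the lattice period; the regime described in the abstract is one where this periodicity is lost and the potential instead traps the kink globally.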
Learning to Extract Motion from Videos in Convolutional Neural Networks
This paper shows how to extract dense optical flow from videos with a
convolutional neural network (CNN). The proposed model constitutes a potential
building block for deeper architectures to allow using motion without resorting
to an external algorithm, e.g. for recognition in videos. We derive our network
architecture from signal processing principles to provide desired invariances
to image contrast, phase and texture. We constrain weights within the network
to enforce strict rotation invariance and substantially reduce the number of
parameters to learn. We demonstrate end-to-end training on only 8 sequences of
the Middlebury dataset, orders of magnitude fewer than competing CNN-based
motion estimation methods require, and obtain performance comparable to classical
methods on the Middlebury benchmark. Importantly, our method outputs a
distributed representation of motion that can represent multiple,
transparent motions and dynamic textures. Our contributions on network design
and rotation invariance offer insights that are not specific to motion estimation.
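The weight-constraint idea can be illustrated with a toy sketch (my own construction, not the paper's architecture): a single base kernel is shared across rotated copies, so the orientation bank adds no parameters, and pooling responses over orientations yields a score invariant to (here, exact 90-degree) rotations of the input.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal((3, 3))
# Weight-tied orientation bank: four exact 90-degree rotations of one kernel.
bank = [np.rot90(base, k) for k in range(4)]

def conv2d_valid(img, k):
    """Plain valid-mode cross-correlation with a 3x3 kernel."""
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def oriented_response(img):
    # Max-pooling over orientations removes dependence on input rotation:
    # rotating the image just permutes which bank member fires strongest.
    return np.max([np.abs(conv2d_valid(img, k)).sum() for k in bank])

img = rng.standard_normal((8, 8))
r1 = oriented_response(img)
r2 = oriented_response(np.rot90(img))  # same score for the rotated input
```

The parameter count stays that of one filter regardless of how many orientations are pooled, which is the spirit of the reduction the abstract describes.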
Point-wise mutual information-based video segmentation with high temporal consistency
In this paper, we tackle the problem of temporally consistent boundary
detection and hierarchical segmentation in videos. While finding the best
high-level reasoning of region assignments in videos is the focus of much
recent research, temporal consistency in boundary detection has so far only
rarely been tackled. We argue that temporally consistent boundaries are a key
component to temporally consistent region assignment. The proposed method is
based on the point-wise mutual information (PMI) of spatio-temporal voxels.
Temporal consistency is established by an evaluation of PMI-based point
affinities in the spectral domain over space and time. Thus, the proposed
method is independent of any optical flow computation or previously learned
motion models. The proposed low-level video segmentation method outperforms the
learning-based state of the art in terms of standard region metrics.
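A toy 1D version of the affinity construction (with a plain empirical PMI estimate rather than the paper's density model, and made-up data) shows the mechanism: neighbouring samples are scored by the pointwise mutual information of their quantised values, and a spectral bipartition of the resulting affinity graph recovers the two regions without any optical flow or motion model.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two constant regions plus noise stand in for spatio-temporal voxels.
signal = np.concatenate([rng.normal(0.1, 0.02, 20), rng.normal(0.9, 0.02, 20)])
sym = np.clip((signal * 4).astype(int), 0, 3)   # quantise to 4 symbols

# Empirical joint distribution of symbols at neighbouring positions.
joint = np.full((4, 4), 1e-6)                   # tiny smoothing
for a, b in zip(sym, sym[1:]):
    joint[a, b] += 1
    joint[b, a] += 1
joint /= joint.sum()
marg = joint.sum(axis=1)

def pmi(a, b):
    return np.log(joint[a, b] / (marg[a] * marg[b]))

n = len(sym)
W = np.zeros((n, n))
for i in range(n - 1):                          # chain-graph affinities
    W[i, i + 1] = W[i + 1, i] = np.exp(pmi(sym[i], sym[i + 1]))

# Spectral bipartition: the sign of the Fiedler vector cuts the weak link,
# which sits exactly at the rarely co-occurring (hence low-PMI) boundary pair.
L = np.diag(W.sum(axis=1)) - W
_, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
```

Because the affinities come purely from co-occurrence statistics, frequent (within-region) pairs get high PMI and the rare boundary pair gets low PMI, which is what makes the cut land on the true boundary.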
Video Object Detection with an Aligned Spatial-Temporal Memory
We introduce Spatial-Temporal Memory Networks for video object detection. At
its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent
computation unit to model long-term temporal appearance and motion dynamics.
The STMM's design enables full integration of pretrained backbone CNN weights,
which we find to be critical for accurate detection. Furthermore, in order to
tackle object motion in videos, we propose a novel MatchTrans module to align
the spatial-temporal memory from frame to frame. Our method produces
state-of-the-art results on the benchmark ImageNet VID dataset, and our
ablative studies clearly demonstrate the contribution of our different design
choices. We release our code and models at
http://fanyix.cs.ucdavis.edu/project/stmn/project.html
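The MatchTrans idea can be sketched independently of the full network (this is my reading of the alignment step, with made-up shapes, not the released code): each memory cell is re-assembled from the previous frame's memory within a small window, weighted by a softmax over feature similarity, so the memory follows the object's motion.

```python
import numpy as np

def match_trans(feat_cur, feat_prev, mem_prev, R=1):
    """Align the previous memory to the current frame by local matching."""
    H, W, C = feat_cur.shape
    aligned = np.zeros_like(mem_prev)
    for i in range(H):
        for j in range(W):
            # Candidate source cells in a (2R+1)^2 window of the prev frame.
            ii = np.arange(max(i - R, 0), min(i + R + 1, H))
            jj = np.arange(max(j - R, 0), min(j + R + 1, W))
            cand_f = feat_prev[np.ix_(ii, jj)].reshape(-1, C)
            cand_m = mem_prev[np.ix_(ii, jj)].reshape(-1, mem_prev.shape[2])
            score = cand_f @ feat_cur[i, j]     # similarity to each candidate
            w = np.exp(score - score.max())
            w /= w.sum()                        # softmax matching weights
            aligned[i, j] = w @ cand_m          # weighted copy of old memory
    return aligned

rng = np.random.default_rng(2)
feat_prev = rng.standard_normal((6, 6, 64))
feat_cur = np.roll(feat_prev, 1, axis=1)        # simulate a one-cell shift
mem_prev = rng.standard_normal((6, 6, 2))
aligned = match_trans(feat_cur, feat_prev, mem_prev)
```

Away from the borders the aligned memory is, up to the softmax's residual mass, the previous memory shifted by the same displacement as the features, which is exactly the behaviour the alignment step is meant to provide before the recurrent update.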
Dense Motion Estimation for Smoke
Motion estimation for highly dynamic phenomena such as smoke is an open
challenge for Computer Vision. Traditional dense motion estimation algorithms
have difficulties with non-rigid and large motions, both of which are
frequently observed in smoke motion. We propose an algorithm for dense motion
estimation of smoke. Our algorithm is robust, fast, and performs better
across different types of smoke than other dense motion estimation
algorithms, including state-of-the-art and neural network approaches. The key
to our contribution is to use skeletal flow, without explicit point matching,
to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this
paper we describe our algorithm in greater detail, and provide experimental
evidence to support our claims.
Comment: ACCV201
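The sparse-to-dense upgrade can be sketched on its own (the skeletal-flow extraction itself is the paper's contribution and is not reproduced here; the points and vectors below are invented): flow known only at a few skeleton points is interpolated out to every pixel.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical sparse flow: (y, x) skeleton points and their flow vectors.
h, w = 16, 16
skel_pts = np.array([[2, 2], [2, 13], [13, 2], [13, 13], [8, 8]], float)
skel_flow = np.array([[1.0, 0.0], [1.0, 0.5], [0.5, 0.0],
                      [0.0, 0.5], [0.7, 0.3]])

ys, xs = np.mgrid[0:h, 0:w]
# Upgrade sparse -> dense: linear interpolation inside the convex hull of
# the skeleton points, zero flow outside it.
dense = np.dstack([
    griddata(skel_pts, skel_flow[:, k], (ys, xs),
             method='linear', fill_value=0.0)
    for k in range(2)
])
```

Linear interpolation is just one possible upgrade; any scattered-data interpolant (RBF, edge-aware schemes such as those used in EpicFlow-style pipelines) could fill the same role, and the sparse flow at the skeleton points is reproduced exactly.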
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
In this work, we study rapid, step-wise improvements of the loss in
transformers when being confronted with multi-step decision tasks. We found
that transformers struggle to learn the intermediate tasks, whereas CNNs have
no such issue on the tasks we studied. When transformers learn the intermediate
task, they do this rapidly and unexpectedly, after both training and
validation loss have saturated for hundreds of epochs. We call these rapid improvements
Eureka-moments, since the transformer appears to suddenly learn a previously
incomprehensible task. Similar leaps in performance have become known as
Grokking. In contrast to Grokking, for Eureka-moments, both the validation and
the training loss saturate before rapidly improving. We trace the problem back
to the Softmax function in the self-attention block of transformers and show
ways to alleviate the problem. These fixes improve training speed: the
improved models reach 95% of the baseline model's performance in just 20% of
the training steps, are much more likely to learn the intermediate task,
reach higher final accuracy, and are more robust to hyper-parameters.
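The softmax issue can be illustrated numerically (a generic toy, not the paper's exact remedy): once one attention logit dominates, the softmax Jacobian diag(p) - p p^T collapses and almost no gradient flows through the attention weights; rescaling the logits (a temperature, one of several possible fixes) restores a usable gradient.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def jacobian_norm(z):
    """Frobenius norm of d softmax(z) / d z = diag(p) - p p^T."""
    p = softmax(z)
    J = np.diag(p) - np.outer(p, p)
    return np.linalg.norm(J)

logits = np.array([12.0, 0.0, 0.0, 0.0])   # saturated attention pattern
sharp = jacobian_norm(logits)              # ~0: gradients barely flow
tempered = jacobian_norm(logits / 4.0)     # temperature tau=4 revives them
```

With saturated logits the attention is effectively frozen on its current (wrong) pattern, which matches the long loss plateau the abstract describes; a milder softmax keeps the attention trainable so the intermediate task can still be discovered.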
A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects
Recently, Minimum Cost Multicut Formulations have been proposed and proven to be successful in both motion trajectory segmentation and multi-target tracking scenarios. Both tasks benefit from decomposing a graphical model into an optimal number of connected components based on attractive and repulsive pairwise terms. The two tasks are formulated on different levels of granularity and, accordingly, leverage mostly local information for motion segmentation and mostly high-level information for multi-target tracking. In this paper we argue that point trajectories and their local relationships can contribute to the high-level task of multi-target tracking and also argue that high-level cues from object detection and tracking are helpful to solve motion segmentation. We propose a joint graphical model for point trajectories and object detections whose Multicuts are solutions to motion segmentation and multi-target tracking problems at once. Results on the FBMS59 motion segmentation benchmark as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark demonstrate the promise of this joint approach
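The multicut objective itself is compact enough to state in code (a brute-force toy with invented edge costs; real instances need the specialised solvers this line of work builds on): cutting edge e pays its cost w_e, so attractive edges (w > 0) resist cutting and repulsive edges (w < 0) reward it, and enumerating node labelings automatically yields consistent cuts, since no cycle can cross a label boundary exactly once.

```python
from itertools import product

# Toy graph on 4 nodes: attractive edges (positive cost to cut) inside the
# intended groups, repulsive edges (negative cost to cut) between them.
edges = {(0, 1): 2.0, (2, 3): 1.5, (1, 2): -1.0, (0, 3): -0.5}

def multicut_cost(labels):
    """Total cost of the edges cut by a node labeling."""
    return sum(w for (i, j), w in edges.items() if labels[i] != labels[j])

# Brute-force search over all labelings (feasible only for toy graphs).
best = min(product(range(4), repeat=4), key=multicut_cost)
components = len(set(best))
```

On this instance the optimum cuts exactly the two repulsive edges, splitting the graph into the two attractive groups {0, 1} and {2, 3}; the joint model in the paper plays the same game on a graph that mixes point trajectories and object detections.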
TraMNet - Transition Matrix Network for Efficient Action Tube Proposals
Current state-of-the-art methods solve spatiotemporal action localisation by
extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate
sets of temporally connected bounding boxes called action micro-tubes.
However, they fail to consider that the underlying anchor proposal hypotheses
should also move (transition) from frame to frame, as the actor or the camera
does. Assuming we evaluate n 2D anchors in each frame, then the number of
possible transitions from each 2D anchor to the next, for a sequence of f
consecutive frames, is in the order of n^f, expensive even for small
values of n. To avoid this problem, we introduce a Transition-Matrix-based
Network (TraMNet) which relies on computing transition probabilities between
anchor proposals while maximising their overlap with ground truth bounding
boxes across frames, and enforcing sparsity via a transition threshold. As the
resulting transition matrix is sparse and stochastic, this reduces the proposal
hypothesis search space from n^f to the cardinality of the thresholded
matrix. At training time, transitions are specific to cell locations of the
feature maps, so that a sparse (efficient) transition matrix is used to train
the network. At test time, a denser transition matrix can be obtained either by
decreasing the threshold or by adding to it all the relative transitions
originating from any cell location, allowing the network to handle transitions
in the test data that might not have been present in the training data, and
making detection translation-invariant. Finally, we show that our network can
handle sparse annotations such as those available in the DALY dataset. We
report extensive experiments on the DALY, UCF101-24 and Transformed-UCF101-24
datasets to support our claims.
Comment: 15 pages
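The transition-matrix idea can be sketched with toy statistics (invented tracks over a handful of anchor cells, not the trained network): count which cell a ground-truth box occupies in the next frame, normalise the counts into row-stochastic transition probabilities, and threshold so the proposal search space shrinks to the surviving entries.

```python
import numpy as np

n_cells = 4
# Hypothetical ground-truth tracks, one anchor-cell index per frame.
tracks = [[0, 0, 1, 1, 2], [1, 2, 2, 3, 3], [0, 1, 1, 2, 3]]

T = np.zeros((n_cells, n_cells))
for tr in tracks:
    for a, b in zip(tr, tr[1:]):        # observed frame-to-frame transitions
        T[a, b] += 1
T /= T.sum(axis=1, keepdims=True)       # row-stochastic transition matrix

thresh = 0.3                            # sparsity via a transition threshold
sparse_T = np.where(T >= thresh, T, 0.0)
n_hypotheses = int((sparse_T > 0).sum())   # vs n_cells**2 dense transitions
```

At test time the threshold can be lowered (or rows made denser) to admit transitions never seen in training, which is the mechanism the abstract uses to make detection translation-invariant.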
A Multi-scale Bilateral Structure Tensor Based Corner Detector
9th Asian Conference on Computer Vision, ACCV 2009, Xi'an, 23-27 September 2009
In this paper, a novel multi-scale nonlinear structure tensor based corner detection algorithm is proposed to effectively improve the classical Harris corner detector. By considering both the spatial and gradient distances of neighboring pixels, a nonlinear bilateral structure tensor is constructed to examine the image local pattern. It can be seen that the linear structure tensor used in the original Harris corner detector is a special case of the proposed bilateral one, obtained by considering only the spatial distance. Moreover, a multi-scale filtering scheme is developed to distinguish trivial structures from true corners based on their different characteristics at multiple scales. A comparison between the proposed approach and four representative, state-of-the-art corner detectors shows that our method has much better performance in terms of both detection rate and localization accuracy.
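The construction can be sketched for a single pixel (a simplified reading with assumed Gaussian kernels, not the paper's exact formulation): the tensor accumulates outer products of neighbouring gradients weighted by spatial distance and, in the bilateral case, also by gradient similarity; dropping the gradient term recovers the linear tensor of the original Harris detector.

```python
import numpy as np

def structure_tensor(img, x, y, r=2, sigma_s=1.5, sigma_g=None):
    """2x2 structure tensor at (x, y); bilateral when sigma_g is given."""
    gy, gx = np.gradient(img.astype(float))
    g0 = np.array([gx[y, x], gy[y, x]])
    J = np.zeros((2, 2))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            g = np.array([gx[y + dy, x + dx], gy[y + dy, x + dx]])
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            if sigma_g is not None:
                # Bilateral term: down-weight dissimilar gradients.
                w *= np.exp(-np.sum((g - g0) ** 2) / (2 * sigma_g ** 2))
            J += w * np.outer(g, g)
    return J

img = np.zeros((9, 9))
img[:, 5:] = 1.0                         # vertical step edge
J_lin = structure_tensor(img, 4, 4)      # linear tensor (spatial only)
J_bil = structure_tensor(img, 4, 4, sigma_g=0.1)
```

As in the Harris detector, corners are where both eigenvalues of J are large; here the step edge gives one dominant eigenvalue, and the bilateral weighting can only shrink (never grow) the contribution of gradients unlike the centre's.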